Abstract



The increasing popularity of metaheuristic algorithms has attracted a great
deal of attention in algorithm analysis and performance evaluation. No-free-lunch
theorems are of both theoretical and practical importance, while many important
studies on the convergence analysis of various metaheuristic algorithms have proven
fruitful. This paper discusses recent results on no-free-lunch theorems and algorithm
convergence, as well as their important implications for algorithm development in
practice. Free lunches may exist for certain types of problems. In addition, we
highlight some open problems for further research.
1 Introduction



Metaheuristic algorithms form an important part of contemporary global optimization 
algorithms [1, 2, 5, 45, 26, 31]. Good examples are simulated annealing and particle 
swarm optimization [24, 25]. They work remarkably efficiently and have many
advantages over traditional, deterministic methods and algorithms, and thus they have
been applied in almost all areas of science, engineering and industry [9, 11, 10, 48].

Despite such huge success in applications, mathematical analysis of algorithms
remains limited and many open problems are still unresolved. There are three
challenging areas for algorithm analysis: complexity, convergence and no-free-lunch
theory.

Complexity analysis of traditional algorithms such as quicksort and matrix inversion
is well established, as these algorithms are deterministic. In contrast, complexity
analysis of metaheuristics remains a challenging task, owing to the stochastic nature
of these algorithms. However, good results do exist concerning randomized search
techniques [1].

Convergence analysis is another challenging area. One of the main difficulties
concerning the convergence analysis of metaheuristic algorithms is the lack of a generic
framework, though substantial studies have been carried out using dynamical systems
and Markov processes. Nevertheless, convergence analysis remains an active research
area with many encouraging results [6, 39, 30, 17].

Alongside the mathematical analysis of optimization algorithms, another equally
challenging, yet fruitful, area is the theory of algorithm performance and comparison,
leading to a wide range of no-free-lunch (NFL) theorems [42, 22]. While NFL theorems
do hold in well-posed cases of optimization function spaces over finite domains, free
lunches are possible in other cases [1, 43, 40].

In this paper, we briefly review and summarize recent studies of no-free-lunch
theory and of free lunch scenarios. This enables us to view NFL and free lunches
in a unified framework, or at least in a convenient way. We also briefly highlight
some of the convergence studies. Based on these studies, we summarize and propose
a series of recommendations for further research.

2 No-Free-Lunch Theorems 

The seminal paper by Wolpert and Macready in 1997 essentially proposed a framework
for comparing the performance of optimization algorithms, using a combination of
Bayesian statistics and Markov random field theories.

Among the many assumptions made in proving the NFL theorems, two are fundamental:
finite states of the search space (and thus of the objective values), and
non-revisiting time-ordered sets.

The first assumption is a good approximation for many problems, especially with
finite-digit approximations. However, there is a mathematical difference between
finite and countably infinite sets. Therefore, the results for finite states/domains
may not be directly applicable to infinite domains. Furthermore, as continuous
problems are uncountable, NFL results for finite domains will usually not hold for
continuous domains [1].

The second assumption, a non-revisiting iterative sequence, is an over-simplification,
as almost all metaheuristic algorithms are revisiting in practice: some points visited
before may well be visited again in the future. The only possible exception is the
Tabu algorithm with a very long Tabu list [13]. Therefore, results for non-revisiting
time-ordered iterations may not hold for revisiting iterations, because revisiting
breaks an important assumption, 'closed under permutation' (c.u.p.), required for
proving the NFL theorems [28].
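The c.u.p. property can be made concrete. Under the standard definition, a set F of functions is c.u.p. if f∘σ is in F for every f in F and every permutation σ of the search points. The following minimal Python sketch checks this on tiny finite search spaces. It assumes each function is encoded as a tuple of objective values, one per search point. That encoding and the helper name `is_closed_under_permutation` are illustrative choices, not part of the original analysis.

```python
from itertools import permutations

def is_closed_under_permutation(functions):
    """Check whether a set of functions on a finite search space is c.u.p.

    Each function is encoded (illustrative assumption) as a tuple of objective
    values, one per search point, so composing f with a permutation sigma of the
    points simply permutes the tuple entries.
    """
    fset = set(functions)
    for f in fset:
        for sigma in permutations(range(len(f))):
            f_sigma = tuple(f[sigma[i]] for i in range(len(f)))  # (f o sigma)(i) = f(sigma(i))
            if f_sigma not in fset:
                return False
    return True

# All "needle" functions on three points: one point has value 1, the others 0.
needles = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(is_closed_under_permutation(needles))       # True
print(is_closed_under_permutation([(1, 0, 0)]))   # False: a single needle is not c.u.p.
```

The full set of needles is c.u.p. because any permutation of a needle is again a needle. The single needle on its own is not c.u.p.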

Furthermore, optimization problems do not necessarily concern the whole set of all
possible functions/problems, and it is often sufficient to consider a subset of problems.
It is worth pointing out that active studies have been carried out on constructing
algorithms that work best on specific subsets of optimization problems; in fact, NFL
theorems do not hold in this case [8].

Before we discuss possible free lunches further, let us sketch Wolpert and Macready's
original proof. Assume that the search space is finite (though possibly very large),
so that the space of possible objective values is also finite. This means that the
objective function is simply a mapping $f: \mathcal{X} \to \mathcal{Y}$, with
$\mathcal{F} = \mathcal{Y}^{\mathcal{X}}$ as the space of all possible problems under permutation.
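As a small worked count, write X for the search space, Y for the set of objective values and F = Y^X for the problem space. With sizes chosen only for illustration, the number of possible problems grows very quickly with the size of the search space:

$$|\mathcal{F}| = |\mathcal{Y}|^{|\mathcal{X}|}, \qquad |\mathcal{X}| = 3,\ |\mathcal{Y}| = 2 \;\Rightarrow\; |\mathcal{F}| = 2^3 = 8, \qquad |\mathcal{X}| = |\mathcal{Y}| = 10 \;\Rightarrow\; |\mathcal{F}| = 10^{10}.$$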

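As an algorithm tends to produce a series of points or solutions in the search
space, it is further assumed that these points are distinct. That is, for $k$ iterations,
$k$ distinct visited points form a time-ordered set

$$\Omega_k = \left\{\left(\Omega_k^x(1), \Omega_k^y(1)\right), \ldots, \left(\Omega_k^x(k), \Omega_k^y(k)\right)\right\}, \qquad (1)$$

where $\Omega_k^x(i)$ is the $i$-th visited point and $\Omega_k^y(i)$ is its objective value.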

There are many ways to define a performance measure, though what makes a good
measure remains debatable [35]. Such a measure can depend on the number of
iterations $k$, the algorithm $a$ and the actual cost function $f$, which can be denoted by
$P(\Omega_k^y \mid f, k, a)$. Here we follow the notation style of the seminal paper by
Wolpert and Macready (1997). For any pair of algorithms $a$ and $b$, the NFL theorem states

$$\sum_f P(\Omega_k^y \mid f, k, a) = \sum_f P(\Omega_k^y \mid f, k, b). \qquad (2)$$

In other words, any algorithm is as good (bad) as a random search when the
performance is averaged over all possible functions.
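The averaging statement can be checked by brute force on a toy problem. The sketch below is a minimal illustration, not the original proof. It assumes a tiny search space X = {0, 1, 2}, values Y = {0, 1}, and two hypothetical deterministic, non-revisiting algorithms: `fixed_order` and `adaptive`. For each algorithm it counts how often each time-ordered sequence of observed objective values (the Ω_k^y of the NFL notation) occurs over all 8 functions in Y^X.

```python
from collections import Counter
from itertools import product

X = [0, 1, 2]   # tiny finite search space (illustrative)
Y = [0, 1]      # tiny finite set of objective values (illustrative)

def fixed_order(f, k):
    """Hypothetical algorithm a: visit points in the fixed order 0, 1, 2 (non-revisiting)."""
    return tuple(f[x] for x in X[:k])

def adaptive(f, k):
    """Hypothetical algorithm b: start at the last point, then pick the next unvisited
    point from the left after seeing a 1 and from the right otherwise."""
    visited = [X[-1]]
    while len(visited) < k:
        remaining = [x for x in X if x not in visited]
        nxt = remaining[0] if f[visited[-1]] == 1 else remaining[-1]
        visited.append(nxt)
    return tuple(f[x] for x in visited)

def y_sequence_histogram(algorithm, k):
    """Count each time-ordered value sequence Omega_k^y over all f in Y^X."""
    counts = Counter()
    for values in product(Y, repeat=len(X)):
        f = dict(zip(X, values))
        counts[algorithm(f, k)] += 1
    return counts

for k in (1, 2, 3):
    print(k, y_sequence_histogram(fixed_order, k) == y_sequence_histogram(adaptive, k))  # True
```

The two histograms coincide for every k. Consequently, any performance measure computed from Ω_k^y (for example, the best value seen so far) has the same average over all f for both algorithms.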

Wolpert and Macready's original proof was carried out by induction. Using a
similar methodology and similar assumptions, other forms of NFL theorems can also
be proved. These theorems are rigorous and thus have important theoretical value.
However, their practical implications are a different issue. In fact, they may not be
so important in practice, as we discuss in a later section.



3 Free Lunch or No Free Lunch 



3.1 Continuous Free Lunches 

The validity of NFL theorems largely depends on the validity of their fundamental
assumptions. However, whether these assumptions are valid in practice is another
question. Often, these assumptions are too stringent, and thus free lunches are possible.

One of the assumptions is the non-revisiting nature of the $k$ distinct points which
form a time-ordered set. When points are revisited, as they are by real-world
optimization algorithms in practice, the 'closed under permutation' condition does not
hold, which renders NFL theorems invalid [33, 28]. This means free lunches do exist
in practical applications.

Another basic assumption is the finiteness of the domains. For continuous domains,
Auger and Teytaud proved in 2010 that the NFL theorem does not hold, and therefore
they concluded that "continuous free lunches exist". Indeed, some algorithms are
better than others. For example, for a 2D sphere function, they demonstrated that an
efficient algorithm only needs 4 iterations/steps to reach the global minimum.
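To see why a handful of evaluations can suffice when the problem class is known, consider the following toy Python sketch. It is not Auger and Teytaud's construction. It assumes, hypothetically, that the objective is known to be a shifted 2D sphere $f(x) = (x_1 - c_1)^2 + (x_2 - c_2)^2 + b$ with unknown $c_1, c_2, b$. Three evaluations then pin down the centre, and a fourth confirms the minimum. The helper name `sphere_minimizer_2d` is illustrative.

```python
def sphere_minimizer_2d(f):
    """Find the minimizer of f(x) = (x1 - c1)**2 + (x2 - c2)**2 + b in 4 evaluations.

    Assumes (hypothetically) that f is known to be a shifted 2-D sphere with unknown
    centre (c1, c2) and offset b; this knowledge is what makes 4 evaluations enough.
    """
    f0 = f((0.0, 0.0))              # c1**2 + c2**2 + b
    f1 = f((1.0, 0.0))              # f0 + 1 - 2*c1
    f2 = f((0.0, 1.0))              # f0 + 1 - 2*c2
    centre = ((1.0 + f0 - f1) / 2.0, (1.0 + f0 - f2) / 2.0)
    return centre, f(centre)        # 4th evaluation, at the recovered minimizer

calls = []

def shifted_sphere(x, c=(0.3, -1.7), b=2.0):
    """Example objective that records every evaluation so they can be counted."""
    calls.append(x)
    return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2 + b

centre, value = sphere_minimizer_2d(shifted_sphere)
print(centre, value, len(calls))    # centre close to (0.3, -1.7), value close to 2.0, 4 calls
```

The sketch only works because it exploits knowledge of the problem class. On a function outside that class it would fail, which is consistent with the idea that free lunches concern restricted sets of problems.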

3.2 Coevolutionary and Multiobjective Free Lunches 

The basic NFL theorems concern a single agent, marching iteratively in the search
space in distinct steps. However, Wolpert and Macready proved in 2005 that NFL
theorems do not hold under coevolution. For example, a set of players (or agents)
in self-play problems can work together so as to produce a champion. This can be
visualized as an evolutionary process of training a chess champion. In this case, free
lunches do exist [43]. It is worth pointing out that a single player tries to pursue
the best next move, while for two players, the fitness function depends on the moves
of both players. Therefore, the basic assumptions for NFL theorems are no longer valid.

For multiobjective optimization problems, things become even more complicated.
An important step in the theoretical analysis was the observation by Corne and
Knowles [7] that some multiobjective optimizers are better than others. One of the
major reasons is that the archiver and generator in multiobjective algorithms can lead
to multiobjective free lunches.
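The following minimal Python sketch shows what an archiver does. It keeps only mutually non-dominated objective vectors under Pareto dominance for minimization. It is a simple, unbounded archive chosen for illustration, not one of the archivers analysed by Corne and Knowles. The names `dominates` and `update_archive` are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidate):
    """Hypothetical simple archiver: keep only mutually non-dominated vectors."""
    if any(dominates(a, candidate) or a == candidate for a in archive):
        return archive                                  # candidate adds nothing new
    return [a for a in archive if not dominates(candidate, a)] + [candidate]

archive = []
for point in [(3, 4), (2, 5), (1, 6), (2, 3), (4, 1), (3, 3)]:
    archive = update_archive(archive, point)
print(archive)   # [(1, 6), (2, 3), (4, 1)]
```

The generator proposes candidate points, and the archiver decides which ones survive. The design of each of these components is where differences between multiobjective optimizers can arise.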

Whether NFL holds or not, it says nothing about the complexity of the problems. In
fact, no-free-lunch theorems have not been proved to hold for problems with NP-hard
complexity [41].

4 Practical Implications of NFL Theorems 

No-free-lunch theorems may be of theoretical importance, and they can also have
important implications for algorithm development in practice, though not everyone
agrees on the real importance of these implications.



There are three kinds of opinions concerning these implications. The first group may
simply ignore these theorems, arguing that the assumptions are too stringent and that
performance measures averaged over all functions are irrelevant in practice. Therefore,
no practical importance can be inferred, and research just carries on.

The second group accepts that NFL theorems can be true, and thus that there is no
universally efficient algorithm. But in practice some algorithms do perform better than
others for a specific problem or a subset of problems. Research effort should therefore
focus on finding the right algorithms for the right type of problem, and problem-specific
knowledge is always helpful in finding the right algorithm(s).

The third kind of opinion is that NFL theorems are not true for other types 
of problems such as continuous problems and NP-hard problems. Theoretical work 
concerns more elaborate studies on extending NFL theorems to other cases or on 
finding free lunches [1]. On the other hand, algorithm development continues to design 
better algorithms which can work for a wider range of problems, not necessarily all 
types of problems. As we have seen from the above analysis, free lunches do exist, 
and better algorithms can be designed for a specific subset of problems [46, 49]. 

Thus, free lunch or no free lunch is not just a simple question; it has important
practical implications. There is a certain truth in all the above arguments, and their
impacts on the optimization community are somewhat mixed. Obviously, in reality,
algorithms with problem-specific knowledge typically work better than random search,
and we also realize that there is no universally generic tool that works best for all
problems. Therefore, we have to seek a balance between speciality and generality,
between algorithm simplicity and problem complexity, and between problem-specific
knowledge and the capability of handling black-box optimization problems.

5 Convergence Analysis 

For convergence analysis, there is no general mathematical framework that provides
insight into the working mechanism, complexity, stability and convergence of any
given algorithm [19, 38]. Despite the increasing popularity of metaheuristics,
mathematical analysis remains fragmented, and many open problems concerning
convergence analysis need urgent attention.

5.1 PSO 

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart in 1995
[24], based on swarm behaviour such as fish schooling and bird flocking in nature.
Since then, PSO has generated much wider interest and forms an exciting,
ever-expanding research subject called swarm intelligence. PSO has been applied to
almost every area in optimization, computational intelligence, and design/scheduling
applications.

The movement of a swarming particle consists of two major components: a
stochastic component and a deterministic component. Each particle is attracted
toward the position of the current global best $g^*$ and its own best location $x_i^*$ in
history, while at the same time it has a tendency to move randomly.

Let $x_i$ and $v_i$ be the position vector and velocity of particle $i$, respectively. The
new velocity and location are determined by the updating formulas

$$v_i^{t+1} = v_i^t + \alpha \epsilon_1 [g^* - x_i^t] + \beta \epsilon_2 [x_i^* - x_i^t], \qquad (3)$$

$$x_i^{t+1} = x_i^t + v_i^{t+1}, \qquad (4)$$

where $\epsilon_1$ and $\epsilon_2$ are two random vectors, each entry taking values between 0
and 1. The parameters $\alpha$ and $\beta$ are the learning parameters or acceleration constants,
which can typically be taken as, say, $\alpha \approx \beta \approx 2$.

There are at least two dozen PSO variants that extend the standard PSO algorithm,
and the most noticeable improvement is probably the use of an inertia function
$\theta(t)$, so that $v_i^t$ is replaced by $\theta(t) v_i^t$ where $\theta \in [0, 1]$. This is equivalent to
introducing a virtual mass to stabilize the motion of the particles, and thus the
algorithm is expected to converge more quickly.
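The update rules (3) and (4), together with the optional inertia θ(t), can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation. The hypothetical function `pso` minimizes its objective, initializes particles uniformly in an assumed box, and draws a fresh random number for each entry of ε_1 and ε_2. It uses no velocity clamping, and the swarm size and iteration count are illustrative.

```python
import random

def pso(objective, dim, n_particles=20, n_iter=200, alpha=2.0, beta=2.0,
        theta=None, bounds=(-5.0, 5.0), seed=0):
    """Minimal PSO (minimization) following updates (3) and (4).

    theta: optional inertia schedule theta(t) in [0, 1] multiplying the old velocity;
    None keeps the standard update (equivalent to theta(t) = 1).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    x_star = [xi[:] for xi in x]                      # each particle's best position x_i^*
    f_star = [objective(xi) for xi in x]
    i_best = min(range(n_particles), key=lambda i: f_star[i])
    g, g_val = x_star[i_best][:], f_star[i_best]       # global best g^*
    for t in range(n_iter):
        w = 1.0 if theta is None else theta(t)
        for i in range(n_particles):
            for d in range(dim):
                e1, e2 = rng.random(), rng.random()   # entries of epsilon_1, epsilon_2 in [0, 1)
                v[i][d] = (w * v[i][d]
                           + alpha * e1 * (g[d] - x[i][d])
                           + beta * e2 * (x_star[i][d] - x[i][d]))
                x[i][d] += v[i][d]                    # position update (4)
            fx = objective(x[i])
            if fx < f_star[i]:
                x_star[i], f_star[i] = x[i][:], fx
                if fx < g_val:
                    g, g_val = x[i][:], fx
    return g, g_val

def sphere(x):
    return sum(xi * xi for xi in x)

g, g_val = pso(sphere, dim=2, alpha=1.5, beta=1.5, theta=lambda t: 0.7)
print(g, g_val)
```

The usage line pairs a constant inertia of 0.7 with α = β = 1.5. This is one illustrative stable-looking choice. With θ = 1 and α = β = 2, unclamped velocities in the plain update can grow large.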

The first convergence analysis of PSO was carried out by Clerc and Kennedy
in 2002 [6] using the theory of dynamical systems. Mathematically, if we ignore the
random factors, we can view the system formed by (3) and (4) as a dynamical system.
If we focus on a single particle $i$ and imagine that there is only one particle in this
system, then the global best $g^*$ is the same as its current best $x_i^*$. In this case, we
have

$$v_i^{t+1} = v_i^t + \gamma (g^* - x_i^t), \qquad \gamma = \alpha + \beta, \qquad (5)$$

and

$$x_i^{t+1} = x_i^t + v_i^{t+1}. \qquad (6)$$

Considering the 1D dynamical system for particle swarm optimization, we can replace
$g^*$ by a constant parameter $p$ to see whether or not the particle of interest converges
towards $p$. By setting $u_t = p - x_t$ and using the notation for dynamical systems, we
have a simple dynamical system

$$v_{t+1} = v_t + \gamma u_t, \qquad u_{t+1} = -v_t + (1 - \gamma) u_t. \qquad (7)$$



Writing $Y_t = (v_t, u_t)^T$, system (7) becomes $Y_{t+1} = A Y_t$ with

$$A = \begin{pmatrix} 1 & \gamma \\ -1 & 1 - \gamma \end{pmatrix}, \qquad (8)$$

so its general solution can be written as $Y_t = A^t Y_0$. The system behaviour can be
characterized by the eigenvalues $\lambda$ of $A$, the roots of $\lambda^2 - (2 - \gamma)\lambda + 1 = 0$:

$$\lambda_{1,2} = 1 - \frac{\gamma}{2} \pm \frac{\sqrt{\gamma^2 - 4\gamma}}{2}. \qquad (9)$$

It can be seen clearly that $\gamma = 4$ leads to a bifurcation.

Following a straightforward analysis of this dynamical system, we can have three
cases. For $0 < \gamma < 4$, the eigenvalues are complex conjugates and, since $\det A = 1$,
they satisfy $|\lambda| = 1$, so cyclic and/or quasi-cyclic trajectories exist. In this case,
when randomness is gradually reduced, some convergence can be observed. For $\gamma > 4$,
non-cyclic behaviour can be expected and the distance from $Y_t$ to the center $(0, 0)$ is
monotonically increasing with $t$. In the special case $\gamma = 4$, some convergence
behaviour can be observed. For a detailed analysis, please refer to Clerc and Kennedy [6].
Since $p$ is linked with the global best, as the iterations continue, it can be expected that
all particles will aggregate towards the global best.
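Both regimes can be checked numerically. The short Python sketch below computes the eigenvalues of the matrix A of the linear map (7) from the characteristic polynomial λ² − (2 − γ)λ + 1 = 0 and iterates the map directly. The helper names `pso_eigenvalues` and `iterate_map` and the chosen γ values are illustrative.

```python
import cmath

def pso_eigenvalues(gamma):
    """Eigenvalues of A = [[1, gamma], [-1, 1 - gamma]] for map (7),
    i.e. the roots of lambda**2 - (2 - gamma)*lambda + 1 = 0."""
    disc = cmath.sqrt(gamma * gamma - 4.0 * gamma)
    return (1.0 - gamma / 2.0 + disc / 2.0, 1.0 - gamma / 2.0 - disc / 2.0)

def iterate_map(gamma, v0=0.0, u0=1.0, steps=50):
    """Iterate v_{t+1} = v_t + gamma*u_t, u_{t+1} = -v_t + (1 - gamma)*u_t."""
    v, u = v0, u0
    for _ in range(steps):
        v, u = v + gamma * u, -v + (1.0 - gamma) * u   # right side uses the old (v, u)
    return v, u

for gamma in (1.0, 3.0, 5.0):
    moduli = [round(abs(lam), 3) for lam in pso_eigenvalues(gamma)]
    print(gamma, moduli, iterate_map(gamma))
```

For γ = 1 and γ = 3 both eigenvalues have modulus 1, and γ = 1 even gives an exactly periodic orbit of period 6. For γ = 5 one eigenvalue has modulus about 2.618, so the trajectory grows rapidly.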

5.2 Firefly Algorithm 

The Firefly Algorithm (FA) was developed by Yang [45, 47], based on the flashing
patterns and behaviour of fireflies. In essence, each firefly is attracted to brighter
ones, while at the same time it explores and searches for prey randomly. In addition,
the brightness of a firefly is determined by the landscape of the objective function.

The movement of a firefly $i$ attracted to another, more attractive (brighter) firefly $j$
is determined by

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j^t - x_i^t) + \alpha \epsilon_i^t, \qquad (10)$$

where the second term is due to the attraction. The third term is randomization,
with $\alpha$ being the randomization parameter, and $\epsilon_i^t$ is a vector of random numbers
drawn from a Gaussian or uniform distribution. Here $\beta_0 \in [0, 1]$ is the attractiveness
at $r = 0$, $\gamma$ is the light absorption coefficient, and $r_{ij} = \|x_i - x_j\|$ is the Cartesian
distance. For other problems such as scheduling, any measure that can effectively
characterize the quantities of interest in the optimization problem can be used as the
'distance' $r$. For most implementations, we can take $\beta_0 = 1$, $\alpha = O(1)$ and $\gamma = O(1)$.
It is worth pointing out that (10) is essentially a random walk biased towards the
brighter fireflies. If $\beta_0 = 0$, it becomes a simple random walk. Furthermore, the
randomization term can easily be extended to other distributions such as Lévy flights.
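A compact Python sketch built on update (10) is given below. It is an illustration under stated assumptions, not the author's reference code. It treats lower objective values as brighter (minimization), uses Gaussian ε_i and an assumed initialization box, and shrinks α geometrically after each generation. That decay schedule and the name `firefly` are hypothetical choices.

```python
import math
import random

def firefly(objective, dim, n_fireflies=15, n_iter=100, beta0=1.0, gamma=1.0,
            alpha=0.2, alpha_decay=0.97, bounds=(-5.0, 5.0), seed=0):
    """Minimal firefly algorithm (minimization) built on update (10).

    A lower objective value means a brighter firefly. alpha_decay is a hypothetical
    schedule that shrinks the random step alpha after each generation.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fireflies)]
    fs = [objective(x) for x in xs]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if fs[j] < fs[i]:                              # j is brighter, so i moves towards j
                    r2 = sum((a - b) ** 2 for a, b in zip(xs[i], xs[j]))
                    attract = beta0 * math.exp(-gamma * r2)    # beta_0 * exp(-gamma * r_ij^2)
                    xs[i] = [a + attract * (b - a) + alpha * rng.gauss(0.0, 1.0)
                             for a, b in zip(xs[i], xs[j])]     # Gaussian epsilon_i
                    fs[i] = objective(xs[i])
        alpha *= alpha_decay
    best = min(range(n_fireflies), key=lambda k: fs[k])
    return xs[best], fs[best]

def sphere(x):
    return sum(xi * xi for xi in x)

x_best, f_best = firefly(sphere, dim=2)
print(x_best, f_best)
```

In this sketch the brightest firefly never moves because no other firefly is brighter. The best objective value in the population therefore never gets worse from one generation to the next.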

We can now carry out the convergence analysis for the firefly algorithm in a
framework similar to Clerc and Kennedy's dynamical analysis. For simplicity, we
start from the equation for firefly motion without the randomness term

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_{ij}^2} (x_j^t - x_i^t). \qquad (11)$$

If we focus on a single agent, we can replace $x_j^t$ by the global best $g$ found so far, and
we have

$$x_i^{t+1} = x_i^t + \beta_0 e^{-\gamma r_i^2} (g - x_i^t), \qquad (12)$$

where the distance can be given by the $\ell_2$-norm $r_i = \|g - x_i^t\|_2$. In an even simpler
1-D case, we can set $y_t = g - x_i^t$, and we have

$$y_{t+1} = y_t - \beta_0 e^{-\gamma y_t^2} y_t. \qquad (13)$$

We can see that $\gamma$ is a scaling parameter which only affects the scale/size of the
firefly movement. In fact, we can let $u_t = \sqrt{\gamma}\, y_t$ and we have

$$u_{t+1} = u_t \left[1 - \beta_0 e^{-u_t^2}\right]. \qquad (14)$$

These equations can be analyzed easily using the same methodology as for the
well-known logistic map

$$u_{t+1} = \lambda u_t (1 - u_t). \qquad (15)$$
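Iterating the scaled 1-D map (14) numerically gives a quick check of how β_0 controls its behaviour. Near the fixed point u = 0 the map behaves like u_{t+1} ≈ (1 − β_0) u_t. The fixed point is therefore stable when |1 − β_0| < 1, i.e. 0 < β_0 < 2, and repelling beyond that. The starting value and step count below are illustrative.

```python
import math

def firefly_map(u0, beta0, steps=200):
    """Iterate the scaled 1-D firefly map (14): u_{t+1} = u_t * (1 - beta0 * exp(-u_t**2))."""
    u = u0
    trajectory = [u]
    for _ in range(steps):
        u = u * (1.0 - beta0 * math.exp(-u * u))
        trajectory.append(u)
    return trajectory

for beta0 in (0.5, 1.5, 4.0):
    tail = firefly_map(0.5, beta0)[-5:]
    print(beta0, [round(u, 4) for u in tail])
```

For β_0 = 0.5 and 1.5 the iterates shrink to 0. For β_0 = 4 the origin repels nearby points by a factor of about 3 per step, so the trajectory keeps wandering instead of settling.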

Straightforward analysis can show that convergence can be achieved for $\beta_0 < 2$.
There is a transition from periodic to chaos at $\beta_0 \approx 4$. This may be surprising, as
the aim of designing a metaheuristic algorithm is to find the optimal solution
efficiently and accurately. However, chaotic behaviour is not necessarily a nuisance;
in fact, we can use it to the advantage of the firefly algorithm. Simple chaotic
characteristics from (15) can often be used as an efficient mixing technique for
generating diverse solutions. Statistically, the logistic map (15) with $\lambda = 4$ and
initial states in $(0, 1)$ corresponds to a beta distribution

$$B(u, p, q) = \frac{\Gamma(p + q)}{\Gamma(p)\,\Gamma(q)}\, u^{p-1} (1 - u)^{q-1} \qquad (16)$$

when $p = q = 1/2$. Here $\Gamma(z)$ is the Gamma function

$$\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\, dt. \qquad (17)$$

In the case when $z = n$ is an integer, we have $\Gamma(n) = (n - 1)!$. In addition,
$\Gamma(1/2) = \sqrt{\pi}$. From the algorithm implementation point of view, we can use a higher
attractiveness $\beta_0$ during the early stage of iterations so that the fireflies can explore
the search space more effectively, even chaotically. As the search continues and
convergence approaches, we can reduce the attractiveness $\beta_0$ gradually, which may
increase the overall efficiency of the algorithm. More studies are needed to confirm this.
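The beta-distribution statement can be checked empirically. The Python sketch below iterates the logistic map (15) with λ = 4 from many uniformly random starting points in (0, 1). It then compares the empirical CDF of the end points with the CDF of Beta(1/2, 1/2), which is (2/π) arcsin(√x). The sample size, step count and seed are illustrative assumptions.

```python
import math
import random

def logistic_samples(n_points=20000, steps=30, lam=4.0, seed=1):
    """Iterate the logistic map u_{t+1} = lam*u_t*(1 - u_t) from many random starts in (0, 1)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_points):
        u = rng.random()
        for _ in range(steps):
            u = lam * u * (1.0 - u)
        samples.append(u)
    return samples

def arcsine_cdf(x):
    """CDF of Beta(1/2, 1/2): (2/pi) * arcsin(sqrt(x))."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))

samples = logistic_samples()
for x in (0.1, 0.25, 0.5, 0.9):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, round(empirical, 3), round(arcsine_cdf(x), 3))
```

The empirical and theoretical columns should agree to within sampling error. The U-shaped arcsine density puts more mass near 0 and 1, which is part of what makes chaotic sequences useful for spreading out candidate solutions.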



5.3 Markov Chains 

Most theoretical studies use Markov chains/processes as a framework for convergence
analysis. A Markov chain is said to be regular if some positive power $k$ of the
transition matrix $P$ has only positive elements. A chain is irreducible if it is possible
to reach every state from any state, and it is ergodic if, in addition, it is aperiodic and
positive recurrent.

For a time-homogeneous chain, as $k \to \infty$ we have the stationary probability
distribution $\pi$, satisfying

$$\pi = \pi P, \qquad (18)$$

so 1 is always an eigenvalue of $P$, and it is the largest in modulus. Under suitable
conditions, this leads to asymptotic convergence to the global optimum $\theta_*$:

$$\lim_{k \to \infty} \theta_k = \theta_* \qquad (19)$$

with probability one [12, 30, 17].
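Equation (18) can be illustrated with power iteration, which repeatedly applies π ← πP until the distribution stops changing. The sketch below assumes a small, hypothetical 3-state transition matrix whose entries are all positive, so the chain is regular. The resulting π is the left eigenvector of P for eigenvalue 1, normalized to sum to 1.

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration pi_{k+1} = pi_k P for a regular transition matrix P (rows sum to 1)."""
    n = len(P)
    pi = [1.0 / n] * n                       # start from the uniform distribution
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

# Hypothetical regular chain: every entry of P itself is positive.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
pi = stationary_distribution(P)
print([round(p, 4) for p in pi])   # [0.2449, 0.4694, 0.2857], i.e. (12, 23, 14)/49
```

For this matrix the other two eigenvalues both equal 0.3, so the iteration converges quickly.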

If we look at PSO closely using the framework of Markov chain Monte Carlo, each
particle in PSO essentially forms a Markov chain, though this Markov chain is biased
towards the current best, as the transition probability often leads to the acceptance of
the move towards the current global best. Other population-based algorithms can also
be viewed in this framework. In essence, all metaheuristic algorithms with piecewise,
interacting paths can be analyzed in the general framework of Markov chain Monte
Carlo. The main challenge is to realize this and to use the appropriate Markov chain
theory to study metaheuristic algorithms. More fruitful studies are likely to emerge in
the future.



5.4 Convergence of SA 

Simulated annealing and generalized hill-climbing algorithms were among the first
algorithms with important results on convergence analysis [29, 14, 23, 44, 37]. The main
idea is to consider simulated annealing as a sequence of homogeneous Markov chains or
a single long inhomogeneous Markov chain [20]. Under weak ergodicity conditions,
convergence is obtained when the temperature $T_k$ is reduced to zero sufficiently
slowly, namely

$$T_k \ge \frac{A}{\log(k)}, \qquad \lim_{k \to \infty} T_k = 0, \qquad (20)$$

where $A$ is a constant.
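A minimal simulated annealing loop with a logarithmic schedule in the spirit of (20) can be sketched in Python as shown below. It uses T_k = A / log(k + 1), shifted by one so that T_1 is finite. In classical analyses the constant A must be at least the depth of the deepest non-global local minimum. The landscape, A = 5 (which exceeds the depth of 4 of its local basin), the neighbourhood and all helper names are hypothetical choices for illustration.

```python
import math
import random

def log_temperature(k, A=1.0):
    """Logarithmic cooling T_k = A / log(k + 1); positive for k >= 1 and tends to 0."""
    return A / math.log(k + 1)

def simulated_annealing(f, neighbours, x0, n_iter=5000, A=5.0, seed=0):
    """Metropolis-style simulated annealing driven by log_temperature (hypothetical helper)."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, f_best = x, fx
    for k in range(1, n_iter + 1):
        y = rng.choice(neighbours(x))
        fy = f(y)
        T = log_temperature(k, A)
        # Downhill moves are always accepted; uphill ones with probability exp(-(fy - fx) / T).
        if fy <= fx or rng.random() < math.exp(-(fy - fx) / T):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = x, fx
    return best, f_best

# Hypothetical 1-D landscape on 0..20: local minimum at 4 (value 2), global minimum at 15 (value 0).
def landscape(i):
    return min(abs(i - 4) + 2, abs(i - 15))

def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j <= 20]

best, f_best = simulated_annealing(landscape, neighbours, x0=4)
print(best, f_best)
```

Because T_k decays only logarithmically, uphill moves remain possible for a long time. This is what lets the chain escape the basin around 4, and it is also why such schedules are slow in practice.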



5.5 Convergence of GA 



One of the most popular and well-studied classes of algorithms is the class of genetic
algorithms [21, 26]. Earlier seminal papers proved the convergence of genetic algorithms
[18, 32]. Important studies on the convergence analysis of GA have been carried out by
Aytug et al. [3, 4], Greenhalgh and Marshall [15], Gutjahr [16, 17] and others.

For example, a well-known result is that the number of iterations $t(\zeta)$ needed in a GA
to reach a convergence probability $\zeta$ can be estimated by the upper limit

$$t(\zeta) \le \left\lceil \frac{\ln(1 - \zeta)}{\ln\left\{1 - \min\left[(1 - \mu)^{Ln}, \mu^{Ln}\right]\right\}} \right\rceil,$$

where the parameter $\mu$ is the mutation rate in genetic algorithms, and $L$ and $n$ are the
string length and population size, respectively [3]. These results are further elaborated
by others [15].
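Evaluating the bound for a few parameter values shows how quickly it grows. The hypothetical helper below implements the upper limit above. It uses `math.log1p` so that ln(1 − m) stays accurate when m is tiny, and it returns infinity when the bound is vacuous, for example when μ = 0. The parameter values are illustrative.

```python
import math

def ga_iteration_bound(zeta, mu, L, n):
    """Evaluate the upper limit on t(zeta) quoted above, for 0 < zeta < 1:
    ceil( ln(1 - zeta) / ln(1 - min[(1 - mu)**(L*n), mu**(L*n)]) )."""
    m = min((1.0 - mu) ** (L * n), mu ** (L * n))
    if m <= 0.0 or m >= 1.0:
        return math.inf                                 # no finite bound in these degenerate cases
    ratio = math.log(1.0 - zeta) / math.log1p(-m)       # log1p keeps ln(1 - m) accurate for tiny m
    return math.ceil(ratio) if math.isfinite(ratio) else math.inf

print(ga_iteration_bound(0.7, 0.5, 1, 1))    # ceil(ln 0.3 / ln 0.5) = 2
print(ga_iteration_bound(0.99, 0.05, 4, 2))  # roughly 1.2e11: the bound is very loose
```

Even for a string length of 4 and a population of 2, the bound is astronomically large unless μ is close to 1/2. This fits the later remark that convergence alone does not imply efficiency.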



5.6 Multiobjective Metaheuristics 

Convergence analysis for single-objective optimization tends to be challenging, and
this complexity is further compounded by the Pareto optimality of multiobjective
optimization. Despite these challenges, asymptotic convergence of metaheuristic
multiobjective optimization has been proved by Villalobos-Arias et al. (2005) using a
framework of Markov chains [40]. They proved that the transition matrix $P$ of a
metaheuristic algorithm can have a stationary distribution $\pi$ such that



$$\left|P^k_{ij} - \pi_j\right| \le (1 - \zeta)^{k-1}, \qquad \forall i, j, \quad (k = 1, 2, \ldots),$$

where $\zeta$ is a function of the mutation probability $\mu$, string length $L$ and population
size $n$. For example, when the population can be divided into two sets with mutation
rates $\mu_1$ and $\mu_2$ and sizes $n_1$ and $n - n_1$, respectively, this $\zeta$ function becomes

$$\zeta = 2^{nL} \mu_1^{n_1 L} \mu_2^{(n - n_1) L}. \qquad (21)$$

They demonstrated that an algorithm satisfying this condition may not converge for
multiobjective optimization problems; however, an algorithm with elitism does
converge under the above conditions.
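A bound of the form $|P^k_{ij} - \pi_j| \le (1 - \zeta)^{k-1}$ can be checked numerically on a toy chain. The sketch below assumes a generic Doeblin-type constant ζ = N · min_ij P_ij, where N is the number of states. This is an illustrative assumption that mirrors the "number of states times minimum transition probability" structure, and it is not Villalobos-Arias et al.'s proof or their specific ζ. The 3-state matrix and helper names are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)] for i in range(n)]

def doeblin_check(P, pi, k_max=20):
    """Return (k, max_ij |P^k_ij - pi_j|, (1 - zeta)**(k - 1)) for k = 1..k_max,
    with the assumed Doeblin-type constant zeta = N * min_ij P_ij."""
    n = len(P)
    zeta = n * min(min(row) for row in P)
    Pk = [row[:] for row in P]          # P^1
    rows = []
    for k in range(1, k_max + 1):
        err = max(abs(Pk[i][j] - pi[j]) for i in range(n) for j in range(n))
        rows.append((k, err, (1.0 - zeta) ** (k - 1)))
        Pk = mat_mul(Pk, P)             # advance to P^(k+1)
    return rows

P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]
pi = [12 / 49, 23 / 49, 14 / 49]        # solves pi = pi P for this P
for k, err, bound in doeblin_check(P, pi)[:5]:
    print(k, round(err, 6), round(bound, 6))
```

Here N = 3 and min P_ij = 0.1, so ζ = 0.3. The observed error stays below (0.7)^(k−1) and in fact decays much faster, which shows that such bounds guarantee convergence but can be conservative about its rate.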

5.7 Other results 

Limited results on convergence analysis exist concerning finite domains, ant colony
optimization [16, 34], cross-entropy optimization, best-so-far convergence [27, 17], the
nested partition method, Tabu search, and largely combinatorial optimization.
However, infinite states/domains and continuous problems pose more challenging
tasks, and many open problems still need satisfactory answers.

On the other hand, it is worth pointing out that an algorithm can converge and yet
not be efficient, as its convergence rate may be low. One of the main tasks in research
is to find efficient algorithms for a given type of problem.

6 Open Problems 

Active research on NFL theorems and algorithm convergence analysis has led to many
important results. Despite this, many crucial problems remain unanswered. These
open questions span a diverse range of areas. Here we highlight a few relevant open
problems.

Framework: Convergence analysis has been fruitful; however, a unified framework
for algorithmic analysis and convergence is still much needed.

Exploration and exploitation: Two important components of metaheuristics are
exploration and exploitation, or diversification and intensification. What is the
optimal balance between these two components?

Performance measure: To compare two algorithms, we have to define a measure for
gauging their performance [36]. At present, there is no agreed performance measure.
What are the best performance measures? Should they be statistical?

Free lunches: No-free-lunch theorems have not been proved for multiobjective
optimization in continuous domains. For single-objective optimization, free lunches
are possible; is this also true for multiobjective optimization? In addition, no-free-lunch
theorems have not been proved to hold for problems with NP-hard complexity
(Whitley and Watson 2005). If free lunches exist, what are their implications in
practice, and how can we find the best algorithm(s)?

Knowledge: Does problem-specific knowledge always help to find an appropriate
solution? How can such knowledge be quantified?

Intelligent algorithms: A major aim of algorithm development is to design better,
intelligent algorithms for solving tough NP-hard optimization problems. What do we
mean by 'intelligent'? What are the practical ways to design truly intelligent,
self-evolving algorithms?